# for QR codes use inline
# %matplotlib inline
# qr_setting = 'url'
# qrviz_setting = 'show'
#
# for lecture use notebook
%matplotlib inline
qr_setting = None
#
%config InlineBackend.figure_format='retina'
# import libraries
import numpy as np
import matplotlib as mp
import pandas as pd
import matplotlib.pyplot as plt
import laUtilities as ut
import slideUtilities as sl
import demoUtilities as dm
import pandas as pd
from importlib import reload
from datetime import datetime
from matplotlib import animation
from IPython.display import Image
from IPython.display import display_html
from IPython.display import display
from IPython.display import Math
from IPython.display import Latex
from IPython.display import HTML;
# Animation Notes
#
from matplotlib import animation
#
# two methods for animation - see:
# http://louistiao.me/posts/notebooks/embedding-matplotlib-animations-in-jupyter-as-interactive-javascript-widgets/
# 1 - directly inline via %matplotlib notebook - advantage: real-time, disadvantage: no manual control
# 2 - encode as video, embedded into notebook (base64) and display via notebook player or JS widget
# downside - not real time, need to generate complete video before it can be displayed
# 2a - HTML(anim.to_html5_video()) [notebook player, bare bones] [uses html5 <video> tag]
# 2b - HTML(anim.to_jshtml()) [JS widget, extra controls]
# these latter can be made default representation via next line, which can also be 'html5'
mp.rcParams['animation.html'] = 'jshtml'
# 3 - save as an animated GIF and display using
# rot_animation.save('rotation.gif', dpi=80, writer='imagemagick')
# 
# see also https://stackoverflow.com/questions/35532498/animation-in-ipython-notebook/46878531#46878531
Symmetric Matrices¶
fig = ut.three_d_figure((24, 5),
'Intersection of the positive definite quadratic form z = 3 x1^2 + 7 x2 ^2 with the constraint ||x|| = 1',
-2, 2, -2, 2, 0, 8,
equalAxes = False, figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
for angle in np.linspace(0, 2*np.pi, 200):
x = np.array([np.cos(angle), np.sin(angle)])
z = x.T @ qf @ x
fig.plotPoint(x[0], x[1], z, 'b')
fig.plotPoint(x[0], x[1], 0, 'g')
fig.plotQF(qf, alpha=0.5)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
# do not call fig.save here
Today we’ll study a very important class of matrices: symmetric matrices.
We’ll see that symmetric matrices have properties that relate to both eigendecomposition, and orthogonality.
Furthermore, symmetric matrices open up a broad class of problems we haven’t yet touched on: constrained optimization.
As a result, symmetric matrices arise very often in applications.
Definition. A symmetric matrix is a matrix \(\,A\) such that \(\;A^T = A\).
Clearly, such a matrix is square.
Furthermore, the entries that are not on the diagonal come in pairs, on opposite sides of the diagonal.
Example. Here are three symmetric matrices:
Here are three nonsymmetric matrices:
Orthogonal Diagonalization¶
At first glance, a symmetric matrix may not seem that special!
But in fact symmetric matrices have a number of interesting properties.
First, we’ll look at a remarkable fact:
Example. Diagonalize the following symmetric matrix:
Solution.
The characteristic equation of \(A\) is
So the eigenvalues are 8, 6, and 3.
We construct a basis for each eigenspace (using our standard method of finding the nullspace of \(A-\lambda I\)):
These three vectors form a basis for \(\mathbb{R}^3.\)
But here is the remarkable thing: these three vectors form an orthogonal set.
That is, any two eigenvectors are orthogonal.
For example,
As a result, \(\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}\) form an orthogonal basis for \(\mathbb{R}^3.\)
Let’s normalize these vectors so they each have length 1:
The orthogonality of the eigenvectors of a symmetric matrix is quite important.
To see this, let’s write the diagonalization of \(A\) in terms of these eigenvectors and eigenvalues:
Then, \(A = PDP^{-1},\) as usual.
But, here is the interesting thing:
\(P\) is square and has orthonormal columns.
So \(P\) is an orthogonal matrix.
So, that means that \(P^{-1} = P^T.\)
So, \(A = PDP^T.\)
The Spectral Theorem¶
Here is a theorem that shows that we always get orthogonal eigenvectors when we diagonalize a symmetric matrix:
Theorem. If \(A\) is symmetric, then any two eigenvectors of \(A\) from different eigenspaces are orthogonal.
Proof.
Let \(\mathbf{v}_1\) and \(\mathbf{v}_2\) be eigenvectors that correspond to distinct eigenvalues, say, \(\lambda_1\) and \(\lambda_2.\)
To show that \(\mathbf{v}_1^T\mathbf{v}_2 = 0,\) compute
So we conclude that \(\lambda_1(\mathbf{v}_1^T\mathbf{v}_2) = \lambda_2(\mathbf{v}_1^T\mathbf{v}_2).\)
But \(\lambda_1 \neq \lambda_2,\) so this can only happen if \(\mathbf{v}_1^T\mathbf{v}_2 = 0.\)
So \(\mathbf{v}_1\) is orthogonal to \(\mathbf{v}_2.\)
The same argument applies to any other pair of eigenvectors with distinct eigenvalues.
So any two eigenvectors of \(A\) from different eigenspaces are orthogonal.
We can now introduce a special kind of diagonalizability:
An \(n\times n\) matrix is said to be orthogonally diagonalizable if there are an orthogonal matrix \(P\) (with \(P^{-1} = P^T\)) and a diagonal matrix \(D\) such that
Such a diagonalization requires \(n\) linearly independent and orthonormal eigenvectors.
When is this possible?
If \(A\) is orthogonally diagonalizable, then
So \(A\) is symmetric!
That is, whenever \(A\) is orthogonally diagonalizable, it is symmetric.
It turns out the converse is true (though we won’t prove it).
Hence we obtain the following important theorem:
Theorem. An \(n\times n\) matrix is orthogonally diagonalizable if and only if it is a symmetric matrix.
Remember that when we studied diagonalization, we found that it was a difficult process to determine if an arbitrary matrix was diagonalizable.
But here, we have a very nice rule: every symmetric matrix is (orthogonally) diagonalizable.
Quadratic Forms¶
An important role that symmetric matrices play has to do with quadratic expressions.
Up until now, we have mainly focused on linear equations – equations in which the \(x_i\) terms occur only to the first power.
Actually, though, we have looked at some quadratic expressions when we considered least-squares problems.
For example, we looked at expressions such as \(\Vert x\Vert^2\) which is \(\sum x_i^2.\)
We’ll now look at quadratic expressions generally. We’ll see that there is a natural and useful connection to symmetric matrices.
Definition. A quadratic form is a function of variables, eg, \(x_1, x_2, \dots, x_n,\) in which every term has degree two.
Examples:
\(4x_1^2 + 2x_1x_2 + 3x_2^2\) is a quadratic form.
\(4x_1^2 + 2x_1\) is not a quadratic form.
Quadratic forms arise in many settings, including signal processing, physics, economics, and statistics.
In computer science, quadratic forms arise in optimization and graph theory, among other areas.
Essentially, what an expression like \(x^2\) is to a scalar, a quadratic form is to a vector.
Fact. Every quadratic form can be expressed as \(\mathbf{x}^TA\mathbf{x}\), where \(A\) is a symmetric matrix.
There is a simple way to go from a quadratic form to a symmetric matrix, and vice versa.
To see this, let’s look at some examples.
Example. Let \(\mathbf{x} = \begin{bmatrix}x_1\\x_2\end{bmatrix}.\) Compute \(\mathbf{x}^TA\mathbf{x}\) for the matrix \(A = \begin{bmatrix}4&0\\0&3\end{bmatrix}.\)
Solution.
Example. Compute \(\mathbf{x}^TA\mathbf{x}\) for the matrix \(A = \begin{bmatrix}3&-2\\-2&7\end{bmatrix}.\)
Solution.
Notice the simple structure: $\(a_{11} \mbox{ is multiplied by } x_1 x_1\)\( \)\(a_{12} \mbox{ is multiplied by } x_1 x_2\)\( \)\(a_{21} \mbox{ is multiplied by } x_2 x_1\)\( \)\(a_{22} \mbox{ is multiplied by } x_2 x_2\)$
We can write this concisely:
Example. For \(\mathbf{x}\) in \(\mathbb{R}^3\), let
Write this quadratic form \(Q(\mathbf{x})\) as \(\mathbf{x}^TA\mathbf{x}\).
Solution.
The coefficients of \(x_1^2, x_2^2, x_3^2\) go on the diagonal of \(A\).
Based on the previous example, we can see that the coefficient of each cross term \(x_ix_j\) is the sum of two values in symmetric positions on opposite sides of the diagonal of \(A\).
So to make \(A\) symmetric, the coefficient of \(x_ix_j\) for \(i\neq j\) must be split evenly between the \((i,j)\)- and \((j,i)\)-entries of \(A\).
You can check that
Question Time! Q24.1
Classifying Quadratic Forms¶
Notice that \(\mathbf{x}^TA\mathbf{x}\) is a scalar.
In other words, when \(A\) is an \(n\times n\) matrix, the quadratic form \(Q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}\) is a real-valued function with domain \(\mathbb{R}^n\).
Here are four quadratic forms with domain \(\mathbb{R}^2\).
Notice that except at \(\mathbf{x}=\mathbf{0},\) the values of \(Q(\mathbf{x})\) are all positive in the first case, and all negative in the last case.
fig = ut.three_d_figure((24, 1), 'z = 3 x1^2 + 7 x2 ^2',
-2, 2, -2, 2, -1, 3,
figsize = (7, 7), qr = qr_setting)
plt.close()
qf = np.array([[3., 0.],[0., 7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 + 7x_2^2$', 'z = 3x_1^2 + 7x_2^2', size = 18)
fig.save()
#
def anim(frame):
fig.ax.view_init(azim = frame, elev = 22)
# fig.canvas.draw()
#
# create and display the animation
HTML(animation.FuncAnimation(fig.fig, anim,
frames = 5 * np.arange(72),
fargs = None,
interval = 100).to_jshtml(default_mode = 'loop'))
fig = ut.three_d_figure((24, 2), 'z = 3 x1^2',
-2, 2, -2, 2, -1, 3,
figsize = (7, 7), qr = qr_setting)
plt.close()
qf = np.array([[3., 0.],[0., 0.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2$', 'z = 3x_1^2', size = 18)
fig.save()
# create and display the animation
HTML(animation.FuncAnimation(fig.fig, anim,
frames = 5 * np.arange(72),
fargs = None,
interval = 100).to_jshtml(default_mode = 'loop'))
fig = ut.three_d_figure((24, 3), 'z = 3 x1^2 - 7 x2 ^2',
-2, 2, -2, 2, -1, 3,
figsize = (7, 7), qr = qr_setting)
plt.close()
qf = np.array([[3., 0.],[0., -7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 - 7x_2^2$', 'z = 3x_1^2 - 7x_2^2', size = 18)
fig.save()
# create and display the animation
HTML(animation.FuncAnimation(fig.fig, anim,
frames = 5 * np.arange(72),
fargs = None,
interval = 100).to_jshtml(default_mode = 'loop'))
fig = ut.three_d_figure((24, 4), 'z = -3 x1^2 - 7 x2 ^2',
-2, 2, -2, 2, -3, 1,
figsize = (7, 7), qr = qr_setting)
plt.close()
qf = np.array([[-3., 0.],[0., -7.]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = -3x_1^2 - 7x_2^2$', 'z = -3x_1^2 - 7x_2^2', size = 18)
fig.save()
# create and display the animation
HTML(animation.FuncAnimation(fig.fig, anim,
frames = 5 * np.arange(72),
fargs = None,
interval = 100).to_jshtml(default_mode = 'loop'))
The differences between these surfaces is important for problems such as optimization.
In an optimization problem, one seeks the minimum or maximum value of a function (perhaps over a subset of its domain).
Definition. A quadratic form \(Q\) is:
positive definite if \(Q(\mathbf{x}) > 0\) for all \(\mathbf{x} \neq 0.\)
positive semidefinite if \(Q(\mathbf{x}) \geq 0\) for all \(\mathbf{x} \neq 0.\)
indefinite if \(Q(\mathbf{x})\) assumes both positive and negative values.
negative definite if \(Q(\mathbf{x}) < 0\) for all \(\mathbf{x} \neq 0.\)
Classifying Quadratic Forms¶
Knowing which kind of quadratic form one is dealing with is important for optimization.
Consider these two quadratic forms:
Notice that that matrices differ in only one position.
Let’s say we would like to find
the minimum value of \(P(\mathbf{x})\) over all \(\mathbf{x}\), or
the minimum value of \(Q(\mathbf{x})\) over all \(\mathbf{x}\)
In each case, we need to ask: is it possible? Does a minimum value even exist?
fig = ut.three_d_figure((24, 5), 'P(x)',
-5, 5, -5, 5, -5, 5,
figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 2],[2, 1]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = \mathbf{x}^TA\mathbf{x}$', 'z = x^TAx', size = 18)
fig.ax.view_init(azim = 30)
fig.save();
fig = ut.three_d_figure((24, 6), 'Q(x)',
-5, 5, -5, 5, -5, 5,
figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 2],[2, 3]])
fig.plotQF(qf, alpha = 0.7)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = \mathbf{x}^TB\mathbf{x}$', 'z = x^TBx', size = 18)
fig.save();
Clearly, we cannot minimize \(\mathbf{x}^TA\mathbf{x}\) over all \(\mathbf{x}\),
… but we can minimize \(\mathbf{x}^TB\mathbf{x}\) over all \(\mathbf{x}\).
How can we distinguish between these two cases in general?
That is, in high dimension, without the ability to visualize?
There is a remarkably simple way to determine, for any quadratic form, which class it falls into.
Theorem. Let \(A\) be an \(n\times n\) symmetric matrix.
Then a quadratic form \(\mathbf{x}^TA\mathbf{x}\) is
positive definite if and only if the eigenvalues of \(A\) are all positive.
positive semidefinite if and only if the eigenvalues of \(A\) are all nonnegative.
indefinite if and only if \(A\) has both positive and negative eigenvalues.
negative definite if and only if the eigenvalues of \(A\) are all negative.
Proof.
A proof sketch for the postive definite case.
Let’s consider \(\mathbf{u}_i\), an eigenvector of \(A\). Then
If all eigenvalues are positive, then all such terms are positive.
Since \(A\) is symmetric, it is diagonalizable and so its eigenvectors span \(\mathbb{R}^n\).
So any \(\mathbf{x}\) can be expressed as a weighted sum of \(A\)’s eigenvectors.
Writing out the expansion of \(\mathbf{x}^TA\mathbf{x}\) in terms of \(A\)’s eigenvectors, we get only positive terms.
Example. Let’s look at the four quadratic forms above. Their associated matrices are
Example. Is \(Q(\mathbf{x}) = 3x_1^2 + 2x_2^2 + x_3^2 + 4x_1x_2 + 4x_2x_3\) positive definite?
Solution. Because of all the plus signs, this form “looks” positive definite. But the matrix of the form is
and the eigenvalues of this matrix turn out to be 5, 2, and -1.
So \(Q\) is an indefinite quadratic form.
Question Time! Q24.2
Constrained Optimization¶
A common kind of optimization is to find the maximum or the minimum value of a quadratic form \(Q(\mathbf{x})\) for \(\mathbf{x}\) in some specified set.
For example, a common constraint is that \(\mathbf{x}\) varies over the set of unit vectors.
This is called constrained optimization.
While it can be a difficult problem in general, for quadratic forms it has a particularly elegant solution.
The requirement that a vector \(\mathbf{x}\) in \(\mathbb{R}^n\) be a unit vector can be stated in several equivalent ways:
Here is an example problem:
Let’s visualize this problem.
Here is the quadratic form \(z = 3x_1^2 + 7x_2^2\):
fig = ut.three_d_figure((24, 7), 'z = 3 x1^2 + 7 x2 ^2',
-2, 2, -2, 2, 0, 8,
equalAxes = False,
figsize = (7, 7), qr = qr_setting)
qf = np.array([[3., 0.],[0., 7.]])
fig.plotQF(qf, alpha=0.6)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'$z = 3x_1^2 + 7x_2^2$', 'z = 3x_1^2 + 7x_2^2', size = 18)
fig.save();
Now the set of all vectors \(\Vert\mathbf{x}\Vert = 1\) is a circle.
We plot this circle in the \((x_1, x_2)\) plane in green,
and we plot the value that \(z\) takes on for those points in blue.
fig = ut.three_d_figure((24, 8),
'Intersection of the positive definite quadratic form z = 3 x1^2 + 7 x2 ^2 with the constraint ||x|| = 1',
-2, 2, -2, 2, 0, 8,
equalAxes = False,
figsize = (7, 7), qr = qr_setting)
plt.close()
qf = np.array([[3., 0.],[0., 7.]])
for angle in np.linspace(0, 2*np.pi, 200):
x = np.array([np.cos(angle), np.sin(angle)])
z = x.T @ qf @ x
fig.plotPoint(x[0], x[1], z, 'b')
fig.plotPoint(x[0], x[1], 0, 'g')
fig.plotQF(qf, alpha=0.5)
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
fig.set_title(r'Intersection of quadratic form $z = 3 x_1^2 + 7 x_2^2$ and $\Vert \mathbf{x}\Vert = 1$',
'Intersection of quadratic form z = 3 x_1^2 + 7 x_2^2 and ||x|| = 1',
size = 18)
fig.save()
# create and display the animation
HTML(animation.FuncAnimation(fig.fig, anim,
frames = 5 * np.arange(72),
fargs = None,
interval = 100).to_jshtml(default_mode = 'loop'))
The set of blue points is the set \(z = 3x_1^2 + 7x_2^2\) such that \(\Vert \mathbf{x}\Vert = 1\).
Now, we can see the geometric sense of a constrained optimization problem.
In particular, we can visualize the solutions to two constrained optimizations:
and
fig = ut.three_d_figure((24, 9),
'Maximum and minimum of the positive definite quadratic form z = 3 x1^2 + 7 x2 ^2 subject to ||x|| = 1',
-2, 2, -2, 2, 0, 8,
equalAxes = False,
figsize = (7, 7), qr = qr_setting)
plt.close()
qf = np.array([[3., 0.],[0., 7.]])
for angle in np.linspace(0, 2*np.pi, 200):
x = np.array([np.cos(angle), np.sin(angle)])
z = x.T @ qf @ x
fig.plotPoint(x[0], x[1], z, 'b')
#
max_pt = np.array([0., 1.])
max_pt_z = max_pt.T @ qf @ max_pt
fig.plotPoint(max_pt[0], max_pt[1], max_pt_z, 'r')
fig.text(max_pt[0]+.2, max_pt[1]+.2, max_pt_z+.2, '(0, 1, 7)', '(0, 1, 7)', 14)
min_pt = np.array([1., 0.])
#
min_pt_z = min_pt.T @ qf @ min_pt
fig.plotPoint(min_pt[0], min_pt[1], min_pt_z, 'r')
fig.text(min_pt[0]+.2, min_pt[1]+.2, min_pt_z-.8, '(1, 0, 3)', '(1, 0, 3)', 14)
#
fig.ax.set_zlabel('z')
fig.desc['zlabel'] = 'z'
#
fig.set_title(r'Max and Min of $z = 3 x_1^2 + 7 x_2^2$ s.t. $\Vert x\Vert = 1$',
'Max and Min of z = 3 x_1^2 + 7 x_2^2 s.t. ||x|| = 1',
size = 18)
fig.save()
# create and display the animation
HTML(animation.FuncAnimation(fig.fig, anim,
frames = 5 * np.arange(72),
fargs = None,
interval = 100).to_jshtml(default_mode = 'loop'))
Solving Constrained Optimization Over Quadratic Forms¶
When a quadratic form has no cross-product terms, it is easy to find the maximum and minimum of \(Q(\mathbf{x})\) for \(\mathbf{x}^T\mathbf{x} = 1.\)
Example. Find the maximum and minimum values of \(Q(\mathbf{x}) = 9x_1^2 + 4x_2^2 + 3x_3^2\) subject to the constraint \(\mathbf{x}^T\mathbf{x} = 1.\)
Since \(x_2^2\) and \(x_3^2\) are nonnegative, we know that
So
Whenever \(x_1^2 + x_2^2 + x_3^2 = 1.\) So the maximum value of \(Q(\mathbf{x})\) cannot exceed 9 when \(\mathbf{x}\) is a unit vector.
Furthermore, \(Q(\mathbf{x}) = 9\) when \(\mathbf{x}=(1,0,0).\)
Thus 9 is the maximum value of \(Q(\mathbf{x})\) for \(\mathbf{x}^T\mathbf{x} = 1\).
A similar argument shows that the minimum value of \(Q(\mathbf{x})\) when \(\mathbf{x}^T\mathbf{x}=1\) is 3.
Eigenvalues Solve Contrained Optimization¶
Observation.
Note that the matrix of the quadratic form in the example is
So the eigenvalues of \(A\) are \(9, 4,\) and \(3\).
We note that the greatest and least eigenvalues equal, respectively, the (constrained) maximum and minimum of \(Q(\mathbf{x})\).
In fact, this is true for any quadratic form.
Theorem. Let \(A\) be a symmetric matrix, and let
Then \(M\) is the greatest eigenvalue \(\lambda_1\) of \(A\).
The value of \(Q(\mathbf{x})\) is \(\lambda_1\) when \(\mathbf{x}\) is the unit eigenvector corresponding to \(\lambda_1\).
A similar theorem holds for the constrained minimum of \(Q(\mathbf{x})\) and the least eigenvector \(\lambda_n\).
Question Time! Q24.3